feat(seinode): in-place TLS enable on Running SeiNodes#264
Closed
bdchatham wants to merge 3 commits into
Closed
Conversation
Builds on #258's externalized-cert foundation. Operators can now add spec.sidecar.tls.secretName to an existing Running SeiNode; the controller composes a NodeUpdate-style plan that cycles the pod with the proxy container mount. Disable and swap remain CEL-blocked. API: - Relax spec.sidecar.tls CEL: allow nil → set transition, still block set → nil and set → different. Set-once semantics; immutability is only loosened in the direction that has a coherent rollout path. - Add Status.CurrentSidecarTLSSecretName: live transport-mode source of truth, distinct from the platform-tooling contract published in Status.SidecarTLS. Transport-mode consistency: - SidecarURLForNode now reads from Status.CurrentSidecarTLSSecretName rather than Spec.Sidecar.TLS. Mid-rollout (spec set, pod not yet cycled), the controller's sidecar client keeps using HTTP until ObserveSidecarTLS stamps the status field — this is the windowing bug that bit PR #254. Test covers the mid-rollout state explicitly. Planner: - sidecarTLSEnableDrift: narrow detector (spec.tls set + status CurrentSidecarTLSSecretName empty + preflight Ready). - buildRunningPlan composes image-drift + TLS-enable drift into one pod cycle. - buildNodeUpdatePlan emits ApplyRBACProxyConfig pre-task + the ObserveSidecarTLS observer when TLS-enable drift fires; co-composes with ObserveImage when both drift. - Init plans (buildBasePlan + buildBootstrapPlan + buildGenesisPlan) inject ObserveSidecarTLS between ApplyStatefulSet/Service and the first sidecar HTTP task so the transport-mode flip happens before ConfigApply / GenerateIdentity / etc. fire. Task: - New ObserveSidecarTLS task. Polls the StatefulSet for rollout convergence (mirrors ObserveImage), then stamps Status.CurrentSidecarTLSSecretName from spec. - Registered in task.go deserializer + paramsForTaskType. classifyPlan emits new labels: tls-enable, node-update+tls-enable. Tests: - Drift detector matrix (fires / converged / not-ready / disabled). - Plan composition across (image-only, tls-only, both). - Init plan ObserveSidecarTLS placement before the first sidecar HTTP task (validates the transport-mode-flip ordering invariant). - SidecarURLForNode mid-rollout regression: spec-set + status-empty keeps HTTP. - Condition lifecycle on TLS-enable plans. Closes #262. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blockers: - cmd/main.go: newSidecarClient now keys the TLS Doer on Status.CurrentSidecarTLSSecretName (not spec). Symmetric source of truth with SidecarURLForNode. Closes the mid-rollout window where mTLS-attempting transport would hit plain HTTP and the bootstrap-pod Phase-2 case where the bootstrap pod has no proxy. k8s-flagged blocker — corrected platform's "non-blocker" framing. - LLD §0.1(b), §4, §5, §6 rewritten to reflect set-once semantics (enable-on-existing allowed; disable + swap CEL-rejected). Doc contradiction with the implementation was the platform + security blocker; design history captured so future readers see the spec is set-once-not-immutable and why. - Document load-bearing trust assumption: Status.CurrentSidecarTLS- SecretName drives controller transport; manifests/role.yaml scopes seinodes/status patch to the controller SA only. Followup tracked. Non-blockers / nits: - ObserveSidecarTLS now requires ReadyReplicas >= *Replicas (k8s). Stricter than ObserveImage: stamping status flips SidecarURLForNode to HTTPS, so a crashlooping proxy must not flip transport prematurely. New test covers the crashloop scenario. - classifyPlan label node-update+tls-enable → node-update-tls-enable (security). Hyphen avoids PromQL/dashboard regex-escape footguns. - sidecarTLSEnableDrift now has an inline nil-guard (k8s). Removes implicit dependency on SidecarTLSEnabled invariants. - buildBasePlan: consolidate two adjacent SidecarTLSEnabled blocks with ordering comment (platform). - buildBootstrapPlan: clarify ObserveSidecarTLS comment now that the Phase-2 (bootstrap-pod, HTTP) vs Phase-5 (production-pod, HTTPS) split is explicit (k8s). - Test comment contradiction fixed (k8s). - New SidecarURLForNode test: status-set-without-spec quadrant. Deferred (tracked): - ValidatingAdmissionPolicy on seinodes/status writes (security #263). - envtest harness for CEL transitions (k8s #264). - Shared rollout-readiness primitive ObserveImage/ObserveSidecarTLS (platform — defer to when a third observer lands). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strip rationale paragraphs, cross-references, historical framing, and restatements of what the code obviously does. Keep present-tense one-liners where the WHY is non-obvious. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace. To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard. |
Collaborator
Author
|
Closing — directional pivot. We're removing sidecar TLS management from the controller entirely. Rationale: controller↔sidecar runs on the private K8s cluster network; TLS termination doesn't earn its keep here. kube-rbac-proxy stays for its SAR-based authz value (in Subtraction PR incoming. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
spec.sidecar.tls.secretNameto an existing Running SeiNode; the controller composes a NodeUpdate-style plan that cycles the pod with the kube-rbac-proxy mount.status.currentSidecarTLSSecretNamerather than spec.Closes #262.
Design
spec.sidecar.tlsnil → setallowed;set → nil(disable) andA → B(swap) rejectedStatus.CurrentSidecarTLSSecretNameObserveSidecarTLSafter rollout convergence (ReadyReplicas >= *Replicas)SidecarURLForNode+cmd/main.go newSidecarClientsidecarTLSEnableDriftbuildNodeUpdatePlanObserveSidecarTLSbetween StatefulSet apply and first sidecar HTTP taskclassifyPlanlabelstls-enable,node-update-tls-enableLLD §0.1(b), §4, §5, §6 rewritten in this PR to reflect set-once semantics — the original LLD said immutable; un-deferred per operator validation that per-node delete+recreate is more friction than warranted.
Cross-review
Trio (platform / kubernetes / security) ran against
77a28c1. Three commits address findings:77a28c1— implementatione28c92b— round-1 fixups (cmd/main.go transport gate, ReadyReplicas check, classifyPlan separator, nil-guard, LLD rewrite)1f33673— comment sweepBlockers addressed:
cmd/main.go newSidecarClientwas gated on spec whileSidecarURLForNodereads status — divergence would break mid-rollout (in-place enable) and Phase-2 bootstrap (mTLS doer against HTTP bootstrap pod). Switched gating predicate toStatus.CurrentSidecarTLSSecretName != "".seinodes/statuspatch RBAC is scoped to the controller SA only. Documented as load-bearing trust assumption in LLD §0.1(b). Defense-in-depth admission policy tracked as Harden status.currentSidecarTLSSecretName writes via ValidatingAdmissionPolicy #263.Non-blocker fixups applied:
ObserveSidecarTLSnow requiresReadyReplicas >= *Replicas(k8s) — stricter thanObserveImagebecause stamping flips transportclassifyPlanlabelnode-update+tls-enable→node-update-tls-enable(security; avoids PromQL regex footgun)sidecarTLSEnableDrifthas explicit nil-guard (k8s)buildBasePlanadjacentSidecarTLSEnabledblocks consolidated (platform)SidecarURLForNodetests for both mid-rollout quadrantsDeferred (tracked):
seinodes/statuswrites (defense in depth)Test plan
make test— full suite greengolangci-lint run --new-from-rev=origin/main ./...— 0 issuesmake manifests generate— CRD diff is additive (new field optional, CEL change is a relaxation)sidecarTLSEnableDriftmatrix (fires / converged / not-ready / disabled)SidecarURLForNodemid-rollout quadrants (spec-set/status-empty stays HTTP; status-set drives HTTPS)ObserveSidecarTLScrashlooping-proxy scenario (UpdatedReplicas=1, ReadyReplicas=0 → stays Running)classifyPlannew labelsspec.sidecar.tls.secretNameon a Running SeiNode with a pre-applied Secret; pod cycles; serves on:8443References
docs/design-seinode-sidecar-tls-toggle-lld.md— LLD revised in this PR🤖 Generated with Claude Code